Abstract: The problem of network coding for multicasting a single source to
multiple sinks was first studied by Ahlswede, Cai, Li and Yeung in 2000,
who established the celebrated max-flow min-cut theorem on
non-physical information flow over a network of independent channels. On
the other hand, in 1980, Han studied the case with correlated multiple
sources and a single sink from the viewpoint of polymatroidal functions, and
demonstrated a necessary and sufficient condition for reliable
transmission over the network. This paper presents an attempt to unify both
cases, which leads to a necessary and sufficient condition for reliable
transmission over a network for multicasting correlated multiple sources to
multiple sinks. The problem of separation of source coding and network
coding is also discussed.



Index terms: network coding, multiple sources, multiple sinks, correlated 
sources, entropy rate, capacity function, polymatroid, co-polymatroid, mini- 
1 Introduction



The problem of network coding for multicasting a single source to multiple
sinks was first studied by Ahlswede, Cai, Li and Yeung [1] in 2000, who
established the celebrated max-flow min-cut theorem on non-physical
information flow over a network of independent channels. On the
other hand, in 1980, Han [3] studied the case with correlated multiple
sources and a single sink from the viewpoint of polymatroidal functions, and
demonstrated a necessary and sufficient condition for reliable
transmission over a network.

This paper presents an attempt to unify both cases and to generalize them to
quite a general case with stationary ergodic correlated sources and noisy
channels (with arbitrary nonnegative real values of capacity that are not
necessarily integers) satisfying the strong converse property (cf. Verdú and
Han [6], Han [4]). This leads to a necessary and sufficient condition for
reliable transmission over a noisy network for multicasting all the correlated
multiple sources to all of the multiple sinks.

It should be noted here that in such a situation with correlated multiple
sources, the central issue turns out to be how to construct the matching
condition between source and channel (i.e., joint source-channel coding),
instead of the traditional concept of capacity region (i.e., channel coding),
although in the special case with independent multiple sources the problem
reduces again to how to describe the capacity region.

Several network models with correlated multiple sources have been studied,
e.g., by Barros and Servetto [9], Ho, Medard, Effros and Koetter [13], Ho,
Medard, Koetter, Karger, Effros, Shi and Leong [14], and Ramamoorthy,
Jain, Chou and Effros [15]. Among others, [13], [14] and [15] consider (without
attention to the converse part) a very restrictive case of error-free network
coding for two stationary memoryless correlated sources with a single sink to
study the error exponent problem, where we notice that all the arguments in
[13], [14] and [15] can be validated only within the narrow class of stationary
memoryless sources of integer bit rates and error-free channels (i.e., the identity
mappings) all with one bit (or integer bits) capacity (these restrictions are
needed solely to invoke "Menger's theorem" in graph theory). The main result
in the present paper is quite free from such severe restrictions, because we can
dispense with the use of Menger's theorem.

On the other hand, [9] revisits the same model as in Han [3], while [15] focuses
on the network with two correlated sources and two sinks to discuss the
separation problem of distributed source coding (based on the Slepian-Wolf
theorem) and network coding. It should be noted that, in the case of networks
with correlated multiple sources, such a separation problem is another central
issue, although it is still far from fully solved. In this paper, we mention a
sufficient condition for separability in the case with multiple sources and
multiple sinks (cf. Remark 5.2).

On the other hand, we may consider another network model with independent
multiple sources but with multiple sinks each of which is required to
reliably reproduce a prescribed subset of the multiple sources that depends
on each sink. However, the problem with this general model looks quite hard,
although, e.g., Yan, Yeung and Zhang [11] and Song, Yeung and Cai [12] have
demonstrated the entropy characterizations of the capacity region, which still
contain limiting operations and are not computable. Incidentally, Yan, Yang
and Zhang [22] have considered, as a computable special case, degree-2
three-layer networks with K-pairs transmission requirements to derive the explicit
capacity region. In this paper, for the same reason, we focus on the case in
which all the correlated multiple sources are to be multicast to all the multiple
sinks and derive a simple necessary and sufficient matching condition in terms
of conditional entropy rates and capacity functions. This case can be regarded
as the network counterpart of the non-network compound Slepian-Wolf system.

We note the following: although throughout the paper we encounter
subtleties coming from the general channel and source characteristics
assumed, the main logical stream remains essentially unchanged if we
consider simpler models, such as stationary correlated Markov sources
together with stationary memoryless noisy channels. This means that
considering only simple cases does not help much at either the conceptual
or the notational level of the arguments. For this reason, we have preferred
here the compact general settings.

The present paper consists of five sections. In Section 2 notations and
preliminaries are described, and in Section 3 we state the main result as well as
its proof. In Section 4 two examples are shown. Section 5 provides another
type of necessary and sufficient condition for transmissibility. Finally, some
detailed comments on the previous papers are given.

2 Preliminaries and Notations



A. Communication networks 

Let us consider an acyclic directed graph $G = (V, E)$ where $V = \{1, 2, \dots, |V|\}$
($|V| < +\infty$), $E \subset V \times V$, but $(i, i) \notin E$ for all $i \in V$. Here, elements of $V$ are
called nodes, and elements $(i, j)$ of $E$ are called edges or channels from $i$ to $j$.
Each edge $(i, j)$ is assigned the capacity $c_{ij} \ge 0$, which specifies the maximum
amount of information flow passing through the channel $(i, j)$. If we want
to emphasize the graph thus capacitated, we write it as $G = (V, E, C)$ where
$C = (c_{ij})_{(i,j)\in E}$. A graph $G = (V, E, C)$ is sometimes called a (communication)
network, and indicated also by $\mathcal{N} = (V, E, C)$. We consider two fixed subsets
$\Phi, \Psi$ of $V$ such that $\Phi \cap \Psi = \emptyset$ (the empty set) with

$$\Phi = \{s_1, s_2, \dots, s_p\},$$
$$\Psi = \{t_1, t_2, \dots, t_q\},$$

where elements of $\Phi$ are called source nodes, while elements of $\Psi$ are called
sink nodes. Here, to avoid subtle irregularities, we assume that there are no
edges $(i, s)$ such that $s \in \Phi$.

Informally, our problem is how to simultaneously transmit the information
generated at the source nodes in $\Phi$ altogether to all the sink nodes in $\Psi$. More
formally, this problem is described as in the following subsection.

Remark 2.1 In the above we have assumed that $\Phi \cap \Psi = \emptyset$. However, we
can reduce the case of $\Phi \cap \Psi \ne \emptyset$ to the case of $\Phi \cap \Psi = \emptyset$ by equivalently
modifying the given network. In fact, suppose $\Phi \cap \Psi \ne \emptyset$ and let $k \in \Phi \cap \Psi$
for some $k$. Then, we add a new source node $k'$ to $\Phi$ and generate a new edge
$(k', k)$ with capacity $\infty$, and remove the node $k$ from $\Phi$. Repeat this procedure
until we have $\Phi \cap \Psi = \emptyset$. The assumption that there are no edges $(i, s)$ such
that $s \in \Phi$ also can be dispensed with by repeating a similar procedure. □
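
The reduction in Remark 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary-based network representation, the function name `separate_sources_and_sinks` and the label `("src", k)` for the new node $k'$ are hypothetical choices, not notation from the paper.

```python
import math

def separate_sources_and_sinks(nodes, capacity, sources, sinks):
    """Return a modified network in which no node is both a source and a sink.

    nodes: iterable of hashable node labels
    capacity: dict mapping an edge (i, j) to its nonnegative capacity
    sources, sinks: collections of node labels (Phi and Psi)
    """
    nodes = set(nodes)
    capacity = dict(capacity)
    sources = set(sources)
    sinks = set(sinks)
    for k in sorted(sources & sinks, key=repr):
        k_new = ("src", k)                 # hypothetical label for the new node k'
        nodes.add(k_new)
        capacity[(k_new, k)] = math.inf    # new edge (k', k) with infinite capacity
        sources.remove(k)                  # k remains a sink only
        sources.add(k_new)                 # k' takes over the role of source
    return nodes, capacity, sources, sinks
```

For instance, with sources {1, 2} and sinks {2, 3}, node 2 is replaced as a source by the new node ("src", 2), which feeds node 2 through an edge of infinite capacity.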

B. Sources and channels

Each source node $s \in \Phi$ generates a stationary and ergodic source process

$$X_s = (X_s^{(1)}, X_s^{(2)}, \dots), \tag{2.1}$$

where $X_s^{(i)}$ ($i = 1, 2, \dots$) takes values in finite source alphabet $\mathcal{X}_s$. Throughout
this paper we consider the case in which the whole joint process $X_\Phi = (X_s)_{s\in\Phi}$
is stationary and ergodic. It is then evident that the joint process
$X_T = (X_s)_{s\in T}$ is also stationary and ergodic for any $T$ such that $\emptyset \ne T \subset \Phi$.
The component processes $X_s$ ($s \in \Phi$) may be correlated. We write $X_T$ as

$$X_T = (X_T^{(1)}, X_T^{(2)}, \dots) \tag{2.2}$$

and put

$$X_T^n = (X_T^{(1)}, X_T^{(2)}, \dots, X_T^{(n)}), \tag{2.3}$$

where $X_T^{(i)}$ ($i = 1, 2, \dots$) takes values in $\mathcal{X}_T = \prod_{s\in T} \mathcal{X}_s$.

On the other hand, it is assumed that all the channels $(i, j) \in E$, specified by
the transition probabilities $w_{ij} : A_{ij}^n \to B_{ij}^n$ with finite input alphabet $A_{ij}$ and
finite output alphabet $B_{ij}$, are statistically independent and satisfy the strong
converse property (see Verdú and Han [6]). It should be noted here that
stationary and memoryless (noisy or noiseless) channels with finite input/output
alphabets satisfy, as very special cases, this property (cf. Gallager [7], Han
[4]). Barros and Servetto [9] have considered the case of stationary and
memoryless sources/channels with finite alphabets. The following lemma plays a
crucial role in establishing the relevant converse of the main result:

Lemma 2.1 (Verdú and Han [6]) The channel capacity $c_{ij}$ of a channel $w_{ij}$
satisfying the strong converse property with finite input/output alphabets is
given by

$$c_{ij} = \lim_{n\to\infty} \frac{1}{n} \max_{X^n} I(X^n; Y^n),$$

where $X^n$, $Y^n$ are the input and the output of the channel $w_{ij}$, respectively,
and $I(X^n; Y^n)$ is the mutual information (cf. Cover and Thomas [8]). □

C. Encoding and decoding

In this section let us state the necessary operation of encoding and decoding
for network coding with correlated multiple sources to be multicast to multiple
sinks.

With arbitrarily small $\delta > 0$ and $\varepsilon > 0$, we introduce an $(n, (R_{ij})_{(i,j)\in E}, \delta, \varepsilon)$
code as the one specified by (2.4) ~ (2.9) below, where we use the notation
$[1, M]$ to indicate $\{1, 2, \dots, M\}$. How to construct a "good" $(n, (R_{ij})_{(i,j)\in E}, \delta, \varepsilon)$
code will be shown in the Direct part of the proof of Theorem 3.1.

1) For all $(s, j) \in E$ ($s \in \Phi$), the encoding function is

$$f_{sj} : \mathcal{X}_s^n \to [1, 2^{n(R_{sj}-\delta)}], \tag{2.4}$$

where the output of $f_{sj}$ is carried over to the encoder $\varphi_{sj}$ of channel $w_{sj}$,
while the decoder $\psi_{sj}$ of $w_{sj}$ outputs an estimate of the output of $f_{sj}$, which
is specified by the stochastic composite function:

$$h_{sj} = \psi_{sj} \circ w_{sj} \circ \varphi_{sj} \circ f_{sj} : \mathcal{X}_s^n \to [1, 2^{n(R_{sj}-\delta)}]; \tag{2.5}$$

2) For all $(i, j) \in E$ ($i \notin \Phi$), the encoding function is

$$f_{ij} : \prod_{k:(k,i)\in E} [1, 2^{n(R_{ki}-\delta)}] \to [1, 2^{n(R_{ij}-\delta)}], \tag{2.6}$$

where the output of $f_{ij}$ is carried over to the encoder $\varphi_{ij}$ of channel $w_{ij}$,
while the decoder $\psi_{ij}$ of $w_{ij}$ outputs an estimate of the output of $f_{ij}$, which is
specified by the stochastic composite function:

$$h_{ij} = \psi_{ij} \circ w_{ij} \circ \varphi_{ij} \circ f_{ij} : \prod_{k:(k,i)\in E} [1, 2^{n(R_{ki}-\delta)}] \to [1, 2^{n(R_{ij}-\delta)}]. \tag{2.7}$$

Here, if $\{k : (k, i) \in E\}$ is empty, we use the convention that $f_{ij}$ is an arbitrary
constant function taking a value in $[1, 2^{n(R_{ij}-\delta)}]$;

3) For all $t \in \Psi$, the decoding function is

$$g_t : \prod_{k:(k,t)\in E} [1, 2^{n(R_{kt}-\delta)}] \to \mathcal{X}_\Phi^n. \tag{2.8}$$

4) Error probability

All sink nodes $t \in \Psi$ are required to reproduce a "good" estimate $\hat{X}_{\Phi,t}^n$ (= the
output of the decoder $g_t$) of $X_\Phi^n$, through the network $\mathcal{N} = (V, E, C)$, so that
the error probability $\Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n\}$ be as small as possible. Formally, for all
$t \in \Psi$, the probability $\lambda_{n,t}$ of decoding error committed at sink $t$ is required
to satisfy

$$\lambda_{n,t} = \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n\} \le \varepsilon \tag{2.9}$$

for all sufficiently large $n$. Clearly, $\hat{X}_{\Phi,t}^n$ are the random variables induced by
$X_\Phi^n$ that were generated at all source nodes $s \in \Phi$.

Remark 2.2 In the above coding process, $f_{ij}$ is applied before $f_{i'j'}$ if $i < i'$,
and $f_{ij}$ is applied before $f_{ij'}$ if $j < j'$. Such an indexing is possible because
we are dealing with acyclic directed graphs. This defines the order in which
the encoding functions are applied. Since $i < j$ if $(i, j) \in E$, a node does not
encode until all the necessary information is received on the input channels
(see Ahlswede, Cai, Li and Yeung [1], Yeung [2]). In this sense, the coding
procedure with the codes $(n, (R_{ij})_{(i,j)\in E}, \delta, \varepsilon)$ defined above is in accordance
with the natural ordering on an acyclic graph. This observation will be fully
used in the proof of the Converse part of Theorem 3.1 in order to establish a
Markov chain property. □
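
Remark 2.2 relies on an indexing of the nodes with $i < j$ for every edge $(i, j)$, which exists exactly because the graph is acyclic. A minimal sketch of how such an indexing could be obtained is given below, using Kahn's topological sorting algorithm; the function names and the example node labels are illustrative assumptions, not part of the paper.

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: order the nodes so that i precedes j for every edge (i, j)."""
    indegree = {v: 0 for v in nodes}
    successors = {v: [] for v in nodes}
    for i, j in edges:
        successors[i].append(j)
        indegree[j] += 1
    ready = deque(sorted(v for v in nodes if indegree[v] == 0))
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in successors[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    if len(order) != len(indegree):
        raise ValueError("graph contains a directed cycle")
    return order

def relabel(nodes, edges):
    """Assign indices 1, 2, ... so that every edge goes from a smaller to a larger index."""
    order = topological_order(nodes, edges)
    index = {v: k + 1 for k, v in enumerate(order)}
    return index, sorted((index[i], index[j]) for i, j in edges)
```

Applying the encoders in increasing order of the new indices guarantees that every node has received all of its inputs before it encodes.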

We now need the following definitions.

Definition 2.1 (rate achievability) If there exists an $(n, (R_{ij})_{(i,j)\in E}, \delta, \varepsilon)$ code
for any arbitrarily small $\varepsilon > 0$ as well as any sufficiently small $\delta > 0$, and for
all sufficiently large $n$, then we say that the rate $(R_{ij})_{(i,j)\in E}$ is achievable for
the network $G = (V, E)$. □

Definition 2.2 (transmissibility) If, for any small $\tau > 0$, the augmented
capacity rate $(R_{ij} = c_{ij} + \tau)_{(i,j)\in E}$ is achievable, then we say that the source $X_\Phi$
is transmissible over the network $\mathcal{N} = (V, E, C)$, where $c_{ij} + \tau$ is called the
$\tau$-capacity of channel $(i, j)$. □

The proofs of Theorem 3.1 (both the converse part and the direct part) are
based on these definitions.

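D. $\lambda$-Typical sequences

Let $\mathbf{x}_\Phi$ denote the sequence of length $n$ such as

$$\mathbf{x}_\Phi = (x_\Phi^{(1)}, \dots, x_\Phi^{(n)}) \in \mathcal{X}_\Phi^n.$$

Similarly, we denote by $\mathbf{x}_T$ ($\emptyset \ne T \subset \Phi$) the sequence such as

$$\mathbf{x}_T = (x_T^{(1)}, \dots, x_T^{(n)}) \in \mathcal{X}_T^n.$$

We set

$$p(\mathbf{x}_T) = \Pr\{X_T^n = \mathbf{x}_T\}$$

and let $H(X_T)$ be the entropy rate of the process $X_T$. With any small $\lambda > 0$,
we say that $\mathbf{x}_\Phi \in \mathcal{X}_\Phi^n$ is a $\lambda$-typical sequence if

$$\left| \frac{1}{n} \log \frac{1}{p(\mathbf{x}_S)} - H(X_S) \right| < \lambda \quad (\emptyset \ne \forall S \subset \Phi), \tag{2.10}$$

where $\mathbf{x}_S$ is the projection of $\mathbf{x}_\Phi$ on the $S$-direction, i.e., $\mathbf{x}_\Phi = (\mathbf{x}_S, \mathbf{x}_{\bar{S}})$ ($\bar{S}$ is
the complement of $S$ in $\Phi$). We shall denote by $T_\lambda(X_\Phi)$ the set of all $\lambda$-typical
sequences. For any subset $\emptyset \ne S \subset \Phi$, let $T_\lambda(X_S)$ denote the projection of
$T_\lambda(X_\Phi)$ on $\mathcal{X}_S^n$; that is,

$$T_\lambda(X_S) = \{\mathbf{x}_S \in \mathcal{X}_S^n \mid (\mathbf{x}_S, \mathbf{x}_{\bar{S}}) \in T_\lambda(X_\Phi) \text{ for some } \mathbf{x}_{\bar{S}} \in \mathcal{X}_{\bar{S}}^n\}. \tag{2.11}$$

Furthermore, set for any $\mathbf{x}_{\bar{S}} \in T_\lambda(X_{\bar{S}})$,

$$T_\lambda(X_S|\mathbf{x}_{\bar{S}}) = \{\mathbf{x}_S \in \mathcal{X}_S^n \mid (\mathbf{x}_S, \mathbf{x}_{\bar{S}}) \in T_\lambda(X_\Phi)\}. \tag{2.12}$$

We say that $\mathbf{x}_S$ is jointly typical with $\mathbf{x}_{\bar{S}}$ if $\mathbf{x}_S \in T_\lambda(X_S|\mathbf{x}_{\bar{S}})$. Now we have
(e.g., cf. Cover and Thomas [8]):

Lemma 2.2

1) For any small $\lambda > 0$ and for all sufficiently large $n$,

$$\Pr\{X_\Phi^n \in T_\lambda(X_\Phi)\} \ge 1 - \lambda; \tag{2.13}$$

2) for any $\mathbf{x}_{\bar{S}} \in T_\lambda(X_{\bar{S}})$,

$$|T_\lambda(X_S|\mathbf{x}_{\bar{S}})| \le 2^{n(H(X_S|X_{\bar{S}}) + 2\lambda)}, \tag{2.14}$$

where $H(X_S|X_{\bar{S}}) = H(X_\Phi) - H(X_{\bar{S}})$ is the conditional entropy rate (cf. Cover
[5]). Specifically,

$$H(X_S|X_{\bar{S}}) = \lim_{n\to\infty} \frac{1}{n} H(X_S^n|X_{\bar{S}}^n).$$

□

This lemma will be used in the process of proving the transmissibility of the
source $X_\Phi$ over the network $\mathcal{N} = (V, E, C)$.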
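
As a concrete illustration of the typicality condition (2.10) and the cardinality bound (2.14), the following sketch enumerates all sequences of a short block length for a two-source example. It assumes a memoryless (i.i.d.) doubly symmetric binary source, which is only a special case of the stationary ergodic sources treated here; the pmf `P`, the block length and the value of `lam` are illustrative choices.

```python
import itertools
import math

# Joint pmf of one symbol pair (x1, x2): a doubly symmetric binary source with
# crossover probability 0.1 (an illustrative assumption).
P = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

def marginal(pmf, coords):
    """Marginal pmf of the coordinates listed in coords."""
    out = {}
    for sym, pr in pmf.items():
        key = tuple(sym[c] for c in coords)
        out[key] = out.get(key, 0.0) + pr
    return out

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict."""
    return -sum(pr * math.log2(pr) for pr in pmf.values() if pr > 0)

def is_typical(seq, lam):
    """Check (2.10) for every nonempty subset S of the two sources."""
    n = len(seq)
    for coords in [(0,), (1,), (0, 1)]:
        pm = marginal(P, coords)
        log_inv_p = -sum(math.log2(pm[tuple(sym[c] for c in coords)]) for sym in seq)
        if abs(log_inv_p / n - entropy(pm)) >= lam:
            return False
    return True

def conditional_typical_count(n, lam):
    """Largest |T_lambda(X_1 | x_2)| over all x_2, by exhaustive enumeration."""
    counts = {}
    for seq in itertools.product(P.keys(), repeat=n):
        if is_typical(seq, lam):
            x2 = tuple(sym[1] for sym in seq)
            counts[x2] = counts.get(x2, 0) + 1
    return max(counts.values(), default=0)

n, lam = 6, 0.3
h_cond = entropy(P) - entropy(marginal(P, (1,)))   # H(X_1 | X_2)
largest = conditional_typical_count(n, lam)
bound = 2 ** (n * (h_cond + 2 * lam))              # right-hand side of (2.14)
```

In this small case the number of sequences jointly typical with any fixed $\mathbf{x}_2$ stays below the bound of (2.14), as the lemma predicts.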

E. Capacity functions

Let $\mathcal{N} = (V, E, C)$ be a network. For any subset $M \subset V$ we say that $(M, V \setminus M)$
(or simply, $M$) is a cut and

$$E_M = \{(i, j) \in E \mid i \in M, j \in V \setminus M\}$$

the cutset of $(M, V \setminus M)$ (or simply, of $M$). Also, we call

$$c(M, V \setminus M) = \sum_{(i,j)\in E,\, i\in M,\, j\in V\setminus M} c_{ij} \tag{2.15}$$

the value of the cut $(M, V \setminus M)$. Moreover, for any subset $S$ such that $\emptyset \ne S \subset \Phi$
(the source node set) and for any $t \in \Psi$ (the sink node set), define

$$\rho_t(S) = \min_{M : S \subset M,\, t \in V\setminus M} c(M, V \setminus M); \tag{2.16}$$

$$\rho_{\mathcal{N}}(S) = \min_{t\in\Psi} \rho_t(S). \tag{2.17}$$

We call this $\rho_{\mathcal{N}}(S)$ the capacity function of $S \subset V$ for the network $\mathcal{N} = (V, E, C)$.
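
For small networks the capacity function can be evaluated directly from definitions (2.15) ~ (2.17) by enumerating all cuts. The sketch below does exactly that; it is exponential in $|V|$ and is meant only as an illustration (a max-flow algorithm would be the practical choice). The function names and the dictionary representation of the capacities are assumptions made for the example.

```python
from itertools import combinations

def cut_value(capacity, M):
    """c(M, V \\ M): total capacity of the edges leaving M, as in (2.15)."""
    return sum(c for (i, j), c in capacity.items() if i in M and j not in M)

def rho_t(nodes, capacity, S, t):
    """rho_t(S) of (2.16): minimum cut value over all M containing S but not t."""
    S = set(S)
    free = [v for v in nodes if v not in S and v != t]
    best = float("inf")
    for r in range(len(free) + 1):
        for extra in combinations(free, r):
            best = min(best, cut_value(capacity, S | set(extra)))
    return best

def rho_network(nodes, capacity, S, sinks):
    """rho_N(S) of (2.17): the smallest rho_t(S) over all sinks t."""
    return min(rho_t(nodes, capacity, S, t) for t in sinks)

# A small hypothetical network with one source "s" and candidate sinks "t" and "b".
nodes = ["s", "a", "b", "t"]
capacity = {("s", "a"): 2, ("s", "b"): 1, ("a", "t"): 1, ("b", "t"): 2, ("a", "b"): 1}
```

Here `rho_t(nodes, capacity, {"s"}, "t")` takes the minimum over the four cuts that contain `s` and exclude `t`.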



Remark 2.3 A set function $\sigma(S)$ on $\Phi$ is called a co-polymatroid* if it holds that

1) $\sigma(\emptyset) = 0$,
2) $\sigma(S) \le \sigma(T)$ ($S \subset T$),
3) $\sigma(S \cap T) + \sigma(S \cup T) \ge \sigma(S) + \sigma(T)$.

It is not difficult to check that $\sigma(S) = H(X_S|X_{\bar{S}})$ is a co-polymatroid (see
Han [3]). On the other hand, a set function $\rho(S)$ on $\Phi$ is called a polymatroid
if it holds that

1') $\rho(\emptyset) = 0$,
2') $\rho(S) \le \rho(T)$ ($S \subset T$),
3') $\rho(S \cap T) + \rho(S \cup T) \le \rho(S) + \rho(T)$.

It is also not difficult to check that for each $t \in \Psi$ the function $\rho_t(S)$ in
(2.16) is a polymatroid (cf. Han [3], Megiddo [23]), but $\rho_{\mathcal{N}}(S)$ in (2.17)
is not necessarily a polymatroid. These properties have been fully invoked in
establishing the matching condition between source and channel for the special
case of $|\Psi| = 1$ (cf. Han [3]). In this paper too, they play a relevant role
in order to argue about the separation problem between distributed source
coding and network coding. This problem is mentioned later in Section 5 (cf.
Remark 5.2). □

With these preparations we will demonstrate the main result in the next
section.

*In Zhang, Chen, Wicker and Berger [TB], the co-polymatroid here is called the
contra-polymatroid.
3 Main Result



The problem that we deal with here is not that of establishing the "capacity
region" as usual, because the concept of "capacity region" does not make sense
for the general network with correlated sources. Instead, we are interested in
the matching problem between the correlated source $X_\Phi$ and the network
$\mathcal{N} = (V, E, C)$ (transmissibility: cf. Definition 2.2). Under what condition is
such a matching possible? This is the key problem here. An answer to this
question is just our main result to be stated here.

Theorem 3.1 The source $X_\Phi$ is transmissible over the network $\mathcal{N} = (V, E, C)$
if and only if

$$H(X_S|X_{\bar{S}}) \le \rho_{\mathcal{N}}(S) \quad (\emptyset \ne \forall S \subset \Phi) \tag{3.1}$$

holds. □
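
Once the conditional entropy rates and the capacity function are known, condition (3.1) is a finite list of inequalities, one for each nonempty subset $S$ of the source nodes. A minimal sketch of such a check follows; the function names and the dictionary layout keyed by `frozenset` are assumptions of this example, and the numerical values are those quoted for the butterfly network in Example 1 of Section 4.

```python
from itertools import combinations

def nonempty_subsets(sources):
    """Yield every nonempty subset S of the source node set as a frozenset."""
    sources = sorted(sources)
    for r in range(1, len(sources) + 1):
        for subset in combinations(sources, r):
            yield frozenset(subset)

def is_transmissible(sources, cond_entropy, capacity_fn, tol=1e-12):
    """Check condition (3.1): H(X_S | X_{S^c}) <= rho_N(S) for every nonempty S.

    cond_entropy and capacity_fn are dicts keyed by frozenset(S); both are
    assumed to have been computed beforehand.
    """
    return all(cond_entropy[S] <= capacity_fn[S] + tol
               for S in nonempty_subsets(sources))

sources = ["s1", "s2"]
H = {frozenset({"s1"}): 1.0, frozenset({"s2"}): 1.0, frozenset({"s1", "s2"}): 2.0}
rho = {frozenset({"s1"}): 1.0, frozenset({"s2"}): 1.0, frozenset({"s1", "s2"}): 2.0}
rho_too_small = dict(rho)
rho_too_small[frozenset({"s1", "s2"})] = 1.5   # a hypothetical weaker network
```

With these inputs the first network passes the check, while lowering $\rho_{\mathcal{N}}(\{s_1, s_2\})$ below $H(X_1 X_2) = 2$ makes it fail.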

Remark 3.1 The case of $|\Psi| = 1$ was investigated by Han [3], and
subsequently revisited by Barros and Servetto [9], while the case of $|\Phi| = 1$ was
investigated by Ahlswede, Cai, Li and Yeung [1]. □



Remark 3.2 If the sources are mutually independent, (3.1) reduces to

$$\sum_{i\in S} H(X_i) \le \rho_{\mathcal{N}}(S) \quad (\emptyset \ne \forall S \subset \Phi).$$

Then, setting the rates as $R_i = H(X_i)$ we have another equivalent form:

$$\sum_{i\in S} R_i \le \rho_{\mathcal{N}}(S) \quad (\emptyset \ne \forall S \subset \Phi). \tag{3.2}$$

This specifies the capacity region of independent message rates in the
traditional sense. In other words, in case the sources are independent, the concept
of capacity region makes sense. In this case too, channel coding looks like
for non-physical flows (as for the case of $|\Phi| = 1$, see Ahlswede, Cai, Li and
Yeung [1]; and as for the case of $|\Phi| > 1$ see, e.g., Koetter and Medard [IB], Li
and Yeung |TT]). It should be noted that formula (3.2) is not derivable by a
naive extension of the arguments as used in the case of single-source ($|\Phi| = 1$),
irrespective of the comment in [1]. □

Proof of Theorem 3.1
1. Converse part:
Suppose that the source $X_\Phi$ is transmissible over the network $\mathcal{N} = (V, E, C)$
with error probability $\lambda_{n,t} = \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n\}$ ($t \in \Psi$) under encoding functions
$f_{sj}$, $f_{ij}$ and decoding functions $g_t$. It is also supposed that $\lambda_{n,t} \to 0$ ($n \to \infty$)
with the $\tau$-capacity.

Here, the input to and the output from channel $(i, j) \in E$ may be regarded as
random variables that were induced by the random variable $X_\Phi^n = (X_{s_1}^n, \dots, X_{s_p}^n)$.
In the following, we fix an element $\mathbf{x}_{\bar{S}} \in \mathcal{X}_{\bar{S}}^n$, where $\bar{S}$ is the complement of $S$
in $\Phi$. Set

$$\lambda_{n,t}(\mathbf{x}_{\bar{S}}) = \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n \mid X_{\bar{S}}^n = \mathbf{x}_{\bar{S}}\}, \tag{3.3}$$

then

$$\lambda_{n,t} = \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n\} = \sum_{\mathbf{x}_{\bar{S}}} \Pr\{X_{\bar{S}}^n = \mathbf{x}_{\bar{S}}\}\, \lambda_{n,t}(\mathbf{x}_{\bar{S}}). \tag{3.4}$$

For $\emptyset \ne S \subset \Phi$ and $t \in \Psi$, let $M_0$ be a minimum cut, i.e., a cut such that

$$\rho_t(S) = \min\{c(M, V \setminus M) \mid S \subset M,\, t \in V \setminus M\} = c(M_0, V \setminus M_0), \tag{3.5}$$

and list all the channels $(i, j)$ such that $i \in M_0$, $j \in V \setminus M_0$ as

$$(i_1, j_1), \dots, (i_r, j_r). \tag{3.6}$$

Furthermore, let the input and the output of channel $(i_k, j_k)$ be denoted by
$Y_k^n$, $Z_k^n$, respectively ($k = 1, 2, \dots, r$). Set

$$Y^n = (Y_1^n, \dots, Y_r^n), \quad Z^n = (Z_1^n, \dots, Z_r^n). \tag{3.7}$$

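Since we are considering those codes $(n, (R_{ij})_{(i,j)\in E}, \delta, \varepsilon)$ as defined by (2.4)
~ (2.9) in Section 2 on an acyclic directed graph (cf. Remark 2.2) and hence
there is no feedback, it is easy to see that $X_S^n \to Y^n \to Z^n \to \hat{X}_{S,t}^n$ (conditioned
on $X_{\bar{S}}^n = \mathbf{x}_{\bar{S}}$) forms a Markov chain in this order. Therefore, by virtue of the
data processing lemma (cf. Cover and Thomas [8]), we have

$$I(X_S^n; \hat{X}_{S,t}^n|\mathbf{x}_{\bar{S}}) \le I(Y^n; Z^n|\mathbf{x}_{\bar{S}}). \tag{3.8}$$

On the other hand, noticing that $X_\Phi^n$ takes values in $\mathcal{X}_{s_1}^n \times \cdots \times \mathcal{X}_{s_p}^n$ and
applying Fano's lemma (cf. Cover and Thomas [8]), we have

$$H(X_S^n|\hat{X}_{S,t}^n, \mathbf{x}_{\bar{S}}) \le 1 + n\lambda_{n,t}(\mathbf{x}_{\bar{S}}) \sum_{k=1}^{p} \log|\mathcal{X}_{s_k}| \equiv r_t(n, \mathbf{x}_{\bar{S}}, S). \tag{3.9}$$

Hence,

$$I(X_S^n; \hat{X}_{S,t}^n|\mathbf{x}_{\bar{S}}) \ge H(X_S^n|\mathbf{x}_{\bar{S}}) - r_t(n, \mathbf{x}_{\bar{S}}, S). \tag{3.10}$$

From (3.8) and (3.10),

$$H(X_S^n|\mathbf{x}_{\bar{S}}) \le I(Y^n; Z^n|\mathbf{x}_{\bar{S}}) + r_t(n, \mathbf{x}_{\bar{S}}, S). \tag{3.11}$$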

On the other hand, since all the channels on the network are mutually
independent and satisfy the strong converse property, it follows by virtue of
Lemma 2.1 that

$$\begin{aligned}
I(Y^n; Z^n|\mathbf{x}_{\bar{S}}) &\le \sum_{k=1}^{r} I(Y_k^n; Z_k^n|\mathbf{x}_{\bar{S}}) \\
&\le \sum_{k=1}^{r} \max_{Y_k^n} I(Y_k^n; Z_k^n) \\
&\le n \sum_{k=1}^{r} \Bigl( \lim_{n\to\infty} \frac{1}{n} \max_{Y_k^n} I(Y_k^n; Z_k^n) + \tau \Bigr) \\
&= n \sum_{k=1}^{r} (c_{i_k j_k} + 2\tau) \\
&= n(\rho_t(S) + 2r\tau)
\end{aligned} \tag{3.12}$$

for all sufficiently large $n$, where the first inequality of (3.12) follows from the
property that all the channels are assumed to be mutually independent.*

It should be noted here that we are now considering the $\tau$-capacity (cf.
Definition 2.2). Thus, averaging both sides of (3.11) and (3.12) with respect to
$\Pr\{X_{\bar{S}}^n = \mathbf{x}_{\bar{S}}\}$, we have

$$\frac{1}{n} H(X_S^n|X_{\bar{S}}^n) \le \rho_t(S) + r_t(n, S), \tag{3.13}$$

where

$$r_t(n, S) = \frac{1}{n} + \lambda_{n,t} \sum_{k=1}^{p} \log|\mathcal{X}_{s_k}| + 2r\tau.$$

Noting that $X_\Phi$ is stationary and ergodic and taking the limit $n \to \infty$ on both
sides of (3.13), it follows that

$$H(X_S|X_{\bar{S}}) \le \rho_t(S) + 2r\tau, \tag{3.14}$$

where $H(X_S|X_{\bar{S}})$ is the conditional entropy rate and we have noticed that
$\lambda_{n,t} \to 0$ as $n \to \infty$. Since $\tau > 0$ is arbitrarily small, we have

$$H(X_S|X_{\bar{S}}) \le \rho_t(S). \tag{3.15}$$

Since $t \in \Psi$ is arbitrary, we conclude that

$$H(X_S|X_{\bar{S}}) \le \rho_{\mathcal{N}}(S).$$

*Specifically, let $U_1, \dots, U_r$; $V_1, \dots, V_r$ be random variables such that
$p(v_i|u_1, \dots, u_r) = p(v_i|u_i)$ ($i = 1, \dots, r$) (channel independence), then
$I(U_1, \dots, U_r; V_1, \dots, V_r) \le \sum_{i=1}^{r} I(U_i; V_i)$ (cf. Cover and Thomas [8]).
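
The first inequality of (3.12) rests on the fact that, for mutually independent channels, the mutual information between the whole input and the whole output is at most the sum of the per-channel mutual informations. The following sketch checks this numerically for two binary symmetric channels driven by correlated inputs. The input distribution and crossover probabilities are illustrative assumptions, and independence is modelled here in the product form $p(v_1, v_2|u_1, u_2) = p(v_1|u_1)\,p(v_2|u_2)$.

```python
import itertools
import math

def bsc(p):
    """Transition matrix of a binary symmetric channel: W[u][v] = Pr{V = v | U = u}."""
    return [[1 - p, p], [p, 1 - p]]

def mutual_information(joint):
    """I(A; B) in bits for a joint pmf given as a dict {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), pr in joint.items():
        pa[a] = pa.get(a, 0.0) + pr
        pb[b] = pb.get(b, 0.0) + pr
    return sum(pr * math.log2(pr / (pa[a] * pb[b]))
               for (a, b), pr in joint.items() if pr > 0)

# Correlated channel inputs (U1, U2) and two independent BSCs (illustrative values).
p_u = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
W1, W2 = bsc(0.1), bsc(0.2)

joint_all, joint_1, joint_2 = {}, {}, {}
for (u1, u2), pu in p_u.items():
    for v1, v2 in itertools.product((0, 1), repeat=2):
        pr = pu * W1[u1][v1] * W2[u2][v2]          # the channels act independently
        joint_all[((u1, u2), (v1, v2))] = pr
        joint_1[(u1, v1)] = joint_1.get((u1, v1), 0.0) + pr
        joint_2[(u2, v2)] = joint_2.get((u2, v2), 0.0) + pr

lhs = mutual_information(joint_all)                               # I(U1, U2; V1, V2)
rhs = mutual_information(joint_1) + mutual_information(joint_2)   # I(U1; V1) + I(U2; V2)
```

Because the inputs are correlated, the outputs are correlated as well, so `lhs` falls strictly below `rhs` in this example.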



2. Direct part:

Suppose that inequality (3.1) holds. It suffices to show that $R_{ij} = c_{ij} + \tau$ is
achievable for any small $\tau > 0$ (see Definitions 2.1, 2.2). To do so, we will use
below the random coding argument. Before that, we need some preparation.
First, with sufficiently small $\delta > 0$ in Definition 2.1 we have

$$c_{ij} + \frac{\tau}{2} < R_{ij} - \delta = c_{ij} + \tau - \delta < c_{ij} + \tau. \tag{3.16}$$

The second inequality guarantees that, for each channel $w_{ij}$, $\tau$-capacity $R_{ij} =
c_{ij} + \tau$ is enough, with appropriate choice of an encoder $\varphi_{ij}$ and a decoder $\psi_{ij}$,
to attain reliable reproduction of the input of the encoder $\varphi_{ij}$ (i.e., the output
of $f_{ij}$ with domain size $2^{n(R_{ij}-\delta)}$) at the decoder $\psi_{ij}$ with maximum decoding
error probability $\gamma_n^{(i,j)}$ such that $\gamma_n^{(i,j)} \to 0$ as $n \to \infty$ (cf. e.g., Gallager
[7], Csiszár and Körner [21]). On the other hand, the first inequality of (3.16)
will be used later.

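In order to first evaluate the error probability $\lambda_{n,t} = \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n\}$,
let us define the error event:

$E_n$ = {decoding errors are caused by channel coding via some $w_{ij}$'s},

or more formally,

$$E_n = \{h_{ij} \ne f_{ij} \text{ as functions for some } (i, j) \in E\}, \tag{3.17}$$

where $f_{ij}$'s and $h_{ij}$'s ($(i, j) \in E$) have been specified in (2.4) ~ (2.9). Then,

$$\lambda_{n,t} = \Pr\{\bar{E}_n\} \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n|\bar{E}_n\} + \Pr\{E_n\} \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n|E_n\}, \tag{3.18}$$

where $\bar{E}_n$ indicates the complement of $E_n$, i.e.,

$$\bar{E}_n = \{h_{ij} = f_{ij} \text{ as functions for all } (i, j) \in E\}. \tag{3.19}$$

Now define

$$E_n^{(i,j)} = \{h_{ij} \ne f_{ij} \text{ as functions}\} \text{ for } (i, j) \in E, \tag{3.20}$$

then it is not difficult to check that $\gamma_n^{(i,j)} = \Pr\{E_n^{(i,j)}\}$ because $\gamma_n^{(i,j)}$ is the
maximum decoding error probability. Moreover, we see that

$$E_n = \bigcup_{(i,j)\in E} E_n^{(i,j)} \quad \text{(disjoint union)}.$$

Therefore,

$$\begin{aligned}
\lambda_{n,t} &\le \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n|\bar{E}_n\} + \Pr\{E_n\} \\
&= \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n|\bar{E}_n\} + \sum_{(i,j)\in E} \Pr\{E_n^{(i,j)}\} \\
&= \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n|\bar{E}_n\} + \sum_{(i,j)\in E} \gamma_n^{(i,j)} \\
&\le \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n|\bar{E}_n\} + |E|\gamma_n
\end{aligned} \tag{3.21}$$

with $\gamma_n = \max_{(i,j)\in E} \gamma_n^{(i,j)}$, where the first equality comes from the fact that
all component channels are independent. It is obvious that $\gamma_n \to 0$ as $n \to \infty$.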

Thus, in order to demonstrate $\lambda_{n,t} \to 0$, it suffices to show that

$$\beta_{n,t} \equiv \Pr\{\hat{X}_{\Phi,t}^n \ne X_\Phi^n|\bar{E}_n\} \to 0 \quad (n \to \infty), \tag{3.22}$$

which means that we may assume throughout the sequel that all the channels
in the network are regarded as noiseless (i.e., the identity mappings).
Accordingly, then, $h_{ij} = \psi_{ij} \circ w_{ij} \circ \varphi_{ij} \circ f_{ij}$ reduces to $h_{ij} = f_{ij}$ with domain
size $2^{n(R_{ij}-\delta)}$, and consequently $\tilde{h}_{ij} = \tilde{f}_{ij}$, where $\tilde{f}_{ij}$ denotes the value of $f_{ij}$ as
a function of $\mathbf{x}_\Phi$; similarly for $\tilde{h}_{ij}$. Thus, we can separate channel coding from
network coding. Hereafter, for this reason, we use only the notation $f_{ij}$, $\tilde{f}_{ij}$
instead of $h_{ij}$, $\tilde{h}_{ij}$.

Let us now return to show, in view of Definition 2.2, that $(c_{ij} + \tau)_{(i,j)\in E}$
is achievable for any small $\tau > 0$. To do so, we invoke the random coding
argument: for each

$$z \in \prod_{k:(k,i)\in E} [1, 2^{n(R_{ki}-\delta)}],$$

make $f_{ij}(z)$ take values uniformly and independently in $[1, 2^{n(R_{ij}-\delta)}]$ (cf. (2.6)).
First, define the associated random variables, as functions of $\mathbf{x}_\Phi \in \mathcal{X}_\Phi^n$, such
that

$$z_s(\mathbf{x}_\Phi) = \mathbf{x}_s \quad (s \in \Phi),$$
$$z_j(\mathbf{x}_\Phi) = (\tilde{f}_{kj}(\mathbf{x}_\Phi))_{(k,j)\in E} \quad (j \notin \Phi).$$

It is evident that $z_j(\mathbf{x}_\Phi)$'s thus defined carry all the information received
at node $j$ during the coding process.

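In the sequel we use the following notation: fix an $\mathbf{x}_\Phi \in \mathcal{X}_\Phi^n$ and decompose it
as $\mathbf{x}_\Phi = (\mathbf{x}_S, \mathbf{x}_{\bar{S}})$ where $\emptyset \ne S \subset \Phi$. We indicate by $\mathbf{x}'_{\Phi[S]}$ an $\mathbf{x}'_\Phi = (\mathbf{x}'_S, \mathbf{x}'_{\bar{S}})$
such that $\mathbf{x}'_S \ne \mathbf{x}_S$, $\mathbf{x}'_{\bar{S}} = \mathbf{x}_{\bar{S}}$, where $\mathbf{x}'_S \ne \mathbf{x}_S$ means componentwise inequality,
i.e., $\mathbf{x}'_s \ne \mathbf{x}_s$ for all $s \in S$. It should be remarked here that two distinct
sequences $\mathbf{x}'_{\Phi[S]} \ne \mathbf{x}_\Phi$ are indistinguishable at the decoder $t \in \Psi$ if and only
if $z_t(\mathbf{x}_\Phi) = z_t(\mathbf{x}'_{\Phi[S]})$. The proof to be stated below is basically in the
same spirit as that of Ahlswede, Cai, Li and Yeung [1], although we need here
to invoke the joint typicality argument as well as subtle arguments on the
classification of error patterns.

Let us now evaluate the probability of decoding error under the encoding
scheme as specified in Section 2.C. We first fix a typical sequence $\mathbf{x}_\Phi \in
T_\lambda(X_\Phi)$, and for $t \in \Psi$ and $\emptyset \ne S \subset \Phi$ define

$$F_{S,t}(\mathbf{x}_\Phi) = \begin{cases} 1 & \text{if there exists some } \mathbf{x}'_{\Phi[S]} \ne \mathbf{x}_\Phi \text{ such that } \mathbf{x}'_S \text{ is jointly typical with } \mathbf{x}_{\bar{S}} \text{ and } z_t(\mathbf{x}_\Phi) = z_t(\mathbf{x}'_{\Phi[S]}), \\ 0 & \text{otherwise.} \end{cases} \tag{3.23}$$

Furthermore, set

$$F(\mathbf{x}_\Phi) = \max_{\emptyset \ne S \subset \Phi,\, t \in \Psi} F_{S,t}(\mathbf{x}_\Phi), \tag{3.24}$$

where we notice that $F(\mathbf{x}_\Phi) = 1$ if and only if $\mathbf{x}_\Phi$ cannot be uniquely recovered
by at least one sink node $t \in \Psi$.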

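Here, for any node $i \in V$ let $\mathcal{P}_i$ denote the set of all the starting nodes of the
longest directed paths ending at node $i$, and set

$$V_0 = \{i \in V \mid \Phi \cap \mathcal{P}_i \ne \emptyset\} \quad \text{and} \quad V_1 = V \setminus V_0.$$

Furthermore, we consider any $\mathbf{x}'_{\Phi[S]} \ne \mathbf{x}_\Phi$ and define

$$B_0 = \{i \in V_0 \mid z_i(\mathbf{x}_\Phi) \ne z_i(\mathbf{x}'_{\Phi[S]})\}, \tag{3.25}$$
$$B_1 = \{i \in V_0 \mid z_i(\mathbf{x}_\Phi) = z_i(\mathbf{x}'_{\Phi[S]})\}, \tag{3.26}$$

where $B_0$ is the set of nodes $i$ at which two sources $\mathbf{x}_\Phi$ and $\mathbf{x}'_{\Phi[S]}$ are
distinguishable, and $B_1 \cup V_1$ is the set of nodes $i$ at which $\mathbf{x}_\Phi$ and $\mathbf{x}'_{\Phi[S]}$ are
indistinguishable. It is obvious that $S \subset B_0 \subset V_0$, $\bar{S} \subset B_1$ and $B_1 \cup V_1 = V \setminus B_0$.

Now let us fix any $\mathbf{x}_\Phi$ and suppose that $z_t(\mathbf{x}_\Phi) = z_t(\mathbf{x}'_{\Phi[S]})$, which implies
that $t \in B_1$. Then, we see that $B_0 = N$ for some $N \subset V_0$ such that $S \subset N$
and $t \notin N$, that is, $N$ is a fixed cut between $S$ and $t$. Then, for $i \in B_0$ and
$(i, j) \in E$,

$$\Pr\{\tilde{f}_{ij}(\mathbf{x}_\Phi) = \tilde{f}_{ij}(\mathbf{x}'_{\Phi[S]}) \mid z_i(\mathbf{x}_\Phi) \ne z_i(\mathbf{x}'_{\Phi[S]})\} \le 2^{-n(c_{ij} + \frac{\tau}{2})}, \tag{3.27}$$

where we have used the first inequality in (3.16). Notice here that $B_0$, $B_1$ are
random sets under the random coding for $f_{ij}$'s. Therefore,

$$\begin{aligned}
\Pr\{B_0 = N\} &= \Pr\{B_0 = N, B_0 \supset N\} \\
&= \Pr\{B_0 = N|B_0 \supset N\} \Pr\{B_0 \supset N\} \\
&\le \Pr\{B_0 = N|B_0 \supset N\} \\
&\le \prod_{(i,j)\in E_N} \Pr\{\tilde{f}_{ij}(\mathbf{x}_\Phi) = \tilde{f}_{ij}(\mathbf{x}'_{\Phi[S]}) \mid z_i(\mathbf{x}_\Phi) \ne z_i(\mathbf{x}'_{\Phi[S]})\} \\
&\le 2^{-n \sum_{(i,j)\in E_N} (c_{ij} + \frac{\tau}{2})},
\end{aligned} \tag{3.28}$$

where $E_N = \{(i, j) \in E \mid i \in N, j \in V \setminus N\}$. Furthermore,

$$\sum_{(i,j)\in E_N} c_{ij} \ge \min_{N : S \subset N,\, t \notin N} \sum_{(i,j)\in E_N} c_{ij} = \rho_t(S), \tag{3.29}$$

where $\rho_t(S)$ was specified in Section 2.E.

In conclusion, it follows from (3.28) and (3.29) that, for any fixed cut $N$
separating $S$ and $t$,

$$\Pr\{B_0 = N\} \le 2^{-n(\rho_t(S) + \frac{\tau}{2})}, \tag{3.30}$$

so that

$$\begin{aligned}
\Pr\{z_t(\mathbf{x}_\Phi) = z_t(\mathbf{x}'_{\Phi[S]})\} &= \Pr\{B_0 = N \text{ for some cut } N \text{ between } S \text{ and } t\} \\
&\le 2^{|V|}\, 2^{-n(\rho_t(S) + \frac{\tau}{2})}.
\end{aligned} \tag{3.31}$$

On the other hand, as is seen from the definition of $F_{S,t}(\mathbf{x}_\Phi)$ in (3.23), condition
$F_{S,t}(\mathbf{x}_\Phi) = 1$ is equivalent to the statement "$z_t(\mathbf{x}_\Phi) = z_t(\mathbf{x}'_{\Phi[S]})$ for some
$\mathbf{x}'_{\Phi[S]} \ne \mathbf{x}_\Phi$ such that $\mathbf{x}'_S$ is jointly typical with $\mathbf{x}_{\bar{S}}$." As a consequence, by
virtue of Lemma 2.2 and (3.31), we obtain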

Pr{F s , t (x*) = 1} < 2 n ^ x ^ x s)+^) Pr{z t (x $ ) = z t (x' m )} 

< 2 |y| 2 n(H(X s |Xs)+2A- Pt (S)-5) 

< 2 |V| 2 -n( Pt (S)-H(X s |Xs)+f) ) ^_ 32 ^ 

where we have chosen A = since A > can be arbitrarily small. Then, in 
view of (|3.24p . it follows that 

Pr{F(x$) = 1} 

= Pr{ max F 5 t(x$) = 1} 

< ]T Pr {^, t (x$) = 1} 

< Y 2 |F| 2~ n(p ' (s) ~ H(Xs|x s)+f) ) (3.33) 

which together with condition (|3.ip yields 

E(F(x$)) = Pr{F(x$) = 1} < 2- cn (for x$ € T A (X$)) (3.34) 

for all sufficiently large n > no, where c = | and -E denotes the expectation 
due to random coding. 

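Finally, in order to show the existence of a deterministic code to attain the
transmissibility over network $\mathcal{N} = (V, E, C)$, set

$$G_n(\mathbf{x}_\Phi) = E(F(\mathbf{x}_\Phi)) \quad \text{for } \mathbf{x}_\Phi \in T_\lambda(X_\Phi),$$

and set $F(\mathbf{x}_\Phi) = 1$ for $\mathbf{x}_\Phi \notin T_\lambda(X_\Phi)$, then, again by Lemma 2.2,

$$\begin{aligned}
\sum_{\mathbf{x}_\Phi \in \mathcal{X}_\Phi^n} p(\mathbf{x}_\Phi) G_n(\mathbf{x}_\Phi) &= \sum_{\mathbf{x}_\Phi \in T_\lambda(X_\Phi)} p(\mathbf{x}_\Phi) G_n(\mathbf{x}_\Phi) + \sum_{\mathbf{x}_\Phi \notin T_\lambda(X_\Phi)} p(\mathbf{x}_\Phi) G_n(\mathbf{x}_\Phi) \\
&\le \sum_{\mathbf{x}_\Phi \in T_\lambda(X_\Phi)} p(\mathbf{x}_\Phi) G_n(\mathbf{x}_\Phi) + \Pr\{X_\Phi^n \notin T_\lambda(X_\Phi)\} \\
&\le \sum_{\mathbf{x}_\Phi \in T_\lambda(X_\Phi)} p(\mathbf{x}_\Phi) 2^{-cn} + \lambda \\
&\le 2^{-cn} + \lambda.
\end{aligned} \tag{3.35}$$

On the other hand, the left-hand side of (3.35) is rewritten as

$$\begin{aligned}
\sum_{\mathbf{x}_\Phi \in \mathcal{X}_\Phi^n} p(\mathbf{x}_\Phi) G_n(\mathbf{x}_\Phi) &= E\Bigl(\sum_{\mathbf{x}_\Phi \in \mathcal{X}_\Phi^n} p(\mathbf{x}_\Phi) F(\mathbf{x}_\Phi)\Bigr) \\
&= E(\text{the probability of decoding error via network } \mathcal{N} = (V, E, C)).
\end{aligned}$$

Thus, we have shown that there exists at least one deterministic code with
probability of decoding error at most $2^{-cn} + \lambda$.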



4 Examples 



In this section we show two examples of Theorem 3.1 with $\Phi = \{s_1, s_2\}$ and
$\Psi = \{t_1, t_2\}$.

Example 1. Consider the network as in Fig. 1 (called the butterfly) where all the
solid edges have capacity 1 and the independent sources $X_1$, $X_2$ are binary and
uniformly distributed (cited from Yan, Yang and Zhang [22]). The capacity
function of this network is computed as follows:

$$\rho_{t_1}(\{s_2\}) = \rho_{t_2}(\{s_1\}) = 1,$$
$$\rho_{t_1}(\{s_1\}) = \rho_{t_2}(\{s_2\}) = 2,$$
$$\rho_{t_1}(\{s_1, s_2\}) = \rho_{t_2}(\{s_1, s_2\}) = 2;$$

$$\rho_{\mathcal{N}}(\{s_1\}) = \min(\rho_{t_1}(\{s_1\}), \rho_{t_2}(\{s_1\})) = 1,$$
$$\rho_{\mathcal{N}}(\{s_2\}) = \min(\rho_{t_1}(\{s_2\}), \rho_{t_2}(\{s_2\})) = 1,$$
$$\rho_{\mathcal{N}}(\{s_1, s_2\}) = \min(\rho_{t_1}(\{s_1, s_2\}), \rho_{t_2}(\{s_1, s_2\})) = 2.$$

On the other hand,

$$H(X_1|X_2) = H(X_1) = 1,$$
$$H(X_2|X_1) = H(X_2) = 1,$$
$$H(X_1 X_2) = H(X_1) + H(X_2) = 2.$$

Therefore, condition (3.1) in Theorem 3.1 is satisfied with equality, so that
the source is transmissible over the network. Then, how to attain this
transmissibility? That is depicted in Fig. 2 where $\oplus$ denotes the exclusive OR. Fig.
3 depicts the corresponding capacity region, which is within the framework of
the previous work (e.g., see Ahlswede et al. [1]).
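
The exclusive-OR scheme for the butterfly can be simulated bit by bit. Since Fig. 2 is not reproduced here, the sketch below assumes the standard butterfly layout: each source has one direct edge to its nearer sink and one edge to a common relay, and the relay output reaches both sinks through a single bottleneck edge. The function name and this topology are assumptions of the example.

```python
import itertools

def butterfly_xor(b1, b2):
    """One use of the (assumed) standard butterfly network with unit-capacity edges.

    s1 sends b1 to t1 and to the relay; s2 sends b2 to t2 and to the relay;
    the relay forwards b1 XOR b2 over the bottleneck edge to both sinks.
    Returns the pair (b1, b2) as decoded at t1 and as decoded at t2.
    """
    mixed = b1 ^ b2                  # the single bit carried by the bottleneck edge
    at_t1 = (b1, b1 ^ mixed)         # t1 knows b1 and recovers b2
    at_t2 = (b2 ^ mixed, b2)         # t2 knows b2 and recovers b1
    return at_t1, at_t2

# Decode every possible pair of source bits at both sinks.
results = {bits: butterfly_xor(*bits) for bits in itertools.product((0, 1), repeat=2)}
```

Every edge carries exactly one bit per network use, and both sinks recover both source bits for all four input pairs.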



Example 2. Consider the network with noisy channels as in Fig. 4 where the
solid edges have capacity 1 and the broken edges have capacity $h(p) < 1$.
Here, $h(p)$ ($0 < p < \frac{1}{2}$) is the binary entropy defined by $h(p) = -p \log_2 p -
(1 - p) \log_2(1 - p)$. The source $(X_1, X_2)$ generated at the nodes $s_1$, $s_2$ is the
binary symmetric source with crossover probability $p$, i.e.,

$$\Pr\{X_1 = 1\} = \Pr\{X_1 = 0\} = \Pr\{X_2 = 1\} = \Pr\{X_2 = 0\} = \frac{1}{2},$$
$$\Pr\{X_2 = 1|X_1 = 0\} = \Pr\{X_2 = 0|X_1 = 1\} = p.$$

Notice that $X_1$, $X_2$ are not independent. The capacity function of this network
is computed as follows:

$$\rho_{t_1}(\{s_2\}) = \rho_{t_2}(\{s_1\}) = h(p),$$
$$\rho_{t_1}(\{s_1\}) = \rho_{t_2}(\{s_2\}) = 1 + h(p),$$
$$\rho_{t_1}(\{s_1, s_2\}) = \rho_{t_2}(\{s_1, s_2\}) = 2;$$

$$\rho_{\mathcal{N}}(\{s_2\}) = \min(\rho_{t_1}(\{s_2\}), \rho_{t_2}(\{s_2\})) = h(p),$$
$$\rho_{\mathcal{N}}(\{s_1\}) = \min(\rho_{t_1}(\{s_1\}), \rho_{t_2}(\{s_1\})) = h(p),$$
$$\rho_{\mathcal{N}}(\{s_1, s_2\}) = \min(\rho_{t_1}(\{s_1, s_2\}), \rho_{t_2}(\{s_1, s_2\})) = 2.$$

On the other hand,

$$H(X_1|X_2) = h(p),$$
$$H(X_2|X_1) = h(p),$$
$$H(X_1 X_2) = 1 + h(p).$$

Therefore, condition (3.1) in Theorem 3.1 is satisfied with strict inequality, so
that the source is transmissible over the network. Then, how to attain this
transmissibility? That is depicted in Fig. 5 where $\mathbf{x}_1$, $\mathbf{x}_2$ are $n$ independent
copies of $X_1$, $X_2$, respectively, and $A$ is an $m \times n$ matrix ($m = nh(p) < n$).
Notice that the entropy of $\mathbf{x}_1 \oplus \mathbf{x}_2$ (componentwise exclusive OR) is $nh(p)$ bits
and hence it is possible to recover $\mathbf{x}_1 \oplus \mathbf{x}_2$ from $A(\mathbf{x}_1 \oplus \mathbf{x}_2)$ (of length $m = nh(p)$)
with asymptotically negligible probability of decoding error, provided that $A$ is
appropriately chosen (see Körner and Marton [20]). It should be remarked
that this example cannot be justified by the previous works such as Ho et al.
[13], Ho et al. [14], and Ramamoorthy et al. [15], because all of them assume
noiseless channels with capacity of one bit, i.e., this example is outside the
previous framework.
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Figure 5: Coding for Example 2 
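
The entropy values used in Example 2 can be recomputed directly from the joint distribution of the binary symmetric source. The following Python sketch does so for the illustrative choice $p = 0.25$ (an assumption made only for this check; any $0 < p < 1/2$ would do) and also evaluates the entropy of $X_1 \oplus X_2$, which is what makes a compressed length of about $nh(p)$ plausible. The block length `n = 1000` and all variable names are hypothetical.

```python
import math
from itertools import product


def h(p: float) -> float:
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropy(pmf: dict) -> float:
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)


p = 0.25  # illustrative crossover probability (assumption)

# Joint pmf: X1 is uniform, and X2 differs from X1 with probability p.
joint = {
    (x1, x2): 0.5 * (p if x1 != x2 else 1 - p)
    for x1, x2 in product((0, 1), repeat=2)
}
marg_x2 = {x2: joint[(0, x2)] + joint[(1, x2)] for x2 in (0, 1)}
xor_pmf = {
    z: sum(q for (x1, x2), q in joint.items() if x1 ^ x2 == z) for z in (0, 1)
}

H_joint = entropy(joint)                    # H(X1 X2)
H_1_given_2 = H_joint - entropy(marg_x2)    # H(X1 | X2)
H_xor = entropy(xor_pmf)                    # H(X1 xor X2)

n = 1000                      # hypothetical block length
m = math.ceil(n * h(p))       # number of rows of A suggested by m = n h(p)
print(H_joint, H_1_given_2, H_xor, m)
```

The check confirms $H(X_1 X_2) = 1 + h(p)$ and $H(X_1|X_2) = H(X_1 \oplus X_2) = h(p)$ for this value of $p$; it says nothing about how to choose $A$, for which the text refers to Korner and Marton [20].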
5 Alternative Transmissibility Condition



In this section we demonstrate an alternative transmissibility condition
equivalent to the necessary and sufficient condition (3.1) given in Theorem 3.1.

To do so, for each $t \in \Psi$ we define the polyhedron $\mathcal{C}_t$ as the set of
all nonnegative rates $(R_s; s \in \Phi)$ such that

$$\sum_{i \in S} R_i \le \rho_t(S) \quad (\emptyset \ne \forall S \subset \Phi), \tag{5.1}$$

where $\rho_t(S)$ is the capacity function as defined in (2.16) of Section 2. Moreover,
define the polyhedron $\mathcal{R}_{\mathrm{SW}}$ as the set of all nonnegative rates
$(R_s; s \in \Phi)$ such that

$$H(X_S|X_{\overline{S}}) \le \sum_{i \in S} R_i \quad (\emptyset \ne \forall S \subset \Phi), \tag{5.2}$$

where $H(X_S|X_{\overline{S}})$ is the conditional entropy rate as defined in Section 2.
Then, we have the following theorem on the transmissibility over the network
$\mathcal{N} = (V, E, C)$.

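Theorem 5.1 The following two statements are equivalent:

1) $$H(X_S|X_{\overline{S}}) \le \rho_{\mathcal{N}}(S) \quad (\emptyset \ne \forall S \subset \Phi), \tag{5.3}$$

2) $$\mathcal{R}_{\mathrm{SW}} \cap \mathcal{C}_t \ne \emptyset \quad (\forall t \in \Psi). \tag{5.4}$$

In order to prove Theorem 5.1 we need the following lemma: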

Lemma 5.1 (Han [3]) Let $\sigma(S)$, $\rho(S)$ be a co-polymatroid and a
polymatroid, respectively, as defined in Remark 2.3. Then, the necessary and
sufficient condition for the existence of some nonnegative rates
$(R_s; s \in \Phi)$ such that

$$\sigma(S) \le \sum_{i \in S} R_i \le \rho(S) \quad (\emptyset \ne \forall S \subset \Phi) \tag{5.5}$$

is that

$$\sigma(S) \le \rho(S) \quad (\emptyset \ne \forall S \subset \Phi). \tag{5.6}$$

□
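
For two sources the content of Lemma 5.1 can be checked by hand, and the following Python sketch does this in closed form. It is only an illustration under stated assumptions: the set functions are passed as dictionaries with the hypothetical keys `"1"`, `"2"`, `"12"` (for $\{s_1\}$, $\{s_2\}$, $\{s_1, s_2\}$), the numerical values are invented for the example, and `feasible_rates` is not a routine from the cited works.

```python
def feasible_rates(sigma: dict, rho: dict):
    """Return one pair (R1, R2) with sigma(S) <= sum_{i in S} R_i <= rho(S)
    for S = {1}, {2}, {1,2}, or None if no nonnegative pair exists."""
    # R1 must satisfy sigma(1) <= R1 <= rho(1); for a compatible R2 to exist
    # we also need sigma(12) - rho(2) <= R1 <= rho(12) - sigma(2).
    lo = max(0.0, sigma["1"], sigma["12"] - rho["2"])
    hi = min(rho["1"], rho["12"] - sigma["2"])
    if sigma["2"] > rho["2"] or sigma["12"] > rho["12"] or lo > hi:
        return None
    r1 = lo
    r2 = max(sigma["2"], sigma["12"] - r1)  # smallest admissible R2 for this R1
    return r1, r2


# Hypothetical co-polymatroid (supermodular): 0.5 + 0.5 <= 1.5.
sigma = {"1": 0.5, "2": 0.5, "12": 1.5}

# Hypothetical polymatroid (submodular): 1.0 + 1.25 >= 1.75, and sigma <= rho.
rho_poly = {"1": 1.0, "2": 1.25, "12": 1.75}

# Not a polymatroid: 0.5 + 0.5 < 2.0, although sigma <= rho still holds.
rho_non_poly = {"1": 0.5, "2": 0.5, "12": 2.0}

print(feasible_rates(sigma, rho_poly))
print(feasible_rates(sigma, rho_non_poly))
```

In the first case (5.6) holds with a polymatroid on the right-hand side and a rate pair is found, as the lemma asserts. In the second case (5.6) still holds setwise, but the upper function is not submodular and no rate pair exists, which suggests why the polymatroid assumption cannot simply be dropped.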

Proof of Theorem 5.1:

Suppose that (5.3) holds; then, in view of (2.17), this implies

$$H(X_S|X_{\overline{S}}) \le \rho_t(S) \quad (\forall t \in \Psi, \ \emptyset \ne \forall S \subset \Phi). \tag{5.7}$$

Since, as was pointed out in Remark 2.3, $\sigma(S) = H(X_S|X_{\overline{S}})$ and
$\rho(S) = \rho_t(S)$ are a co-polymatroid and a polymatroid, respectively,
application of Lemma 5.1 ensures, for each $t \in \Psi$, the existence of some
nonnegative rates $(R_s; s \in \Phi)$ such that

$$H(X_S|X_{\overline{S}}) \le \sum_{i \in S} R_i \le \rho_t(S) \quad (\emptyset \ne \forall S \subset \Phi), \tag{5.8}$$

which is nothing but (5.4).

Next, suppose that (5.4) holds. This implies (5.8), which in turn implies (5.7),
i.e., (5.3) holds. □

Remark 5.1 The necessary and sufficient condition of the form (5.4) appears
(without the proof) in Ramamoorthy, Jain, Chou and Effros [15] with
$|\Phi| = 2$, $|\Psi| = 2$, which they call the feasibility. They attribute the
sufficiency part simply to Ho, Medard, Effros and Koetter [13] with
$|\Phi| = 2$, $|\Psi| = 1$ (also, cf. Ho, Medard, Koetter, Karger, Effros, Shi, and
Leong [14] with $|\Phi| = 2$, $|\Psi| = 1$), while attributing the necessity part to
Han [3], Barros and Servetto [9]. However, notice that all the arguments in
[13], [14] ([13] is included in [14]) can be validated only within the class of
stationary memoryless sources of integer bit rates and error-free channels
(i.e., the identity mappings) all with one bit capacity (this restriction is
needed to invoke "Menger's theorem" in graph theory); while the present paper,
without such severe restrictions, treats "general" acyclic networks, allowing
for general correlated stationary ergodic sources as well as general
statistically independent channels with each satisfying the strong converse
property (cf. Lemma 2.1). Moreover, as long as we are concerned also with noisy
channels, the way of approaching the problem as in [13], [14] does not work as
well, because in this noisy case we have to cope with two kinds of error
probabilities, one due to error probabilities for source coding and the other
due to error probabilities for network coding (i.e., channel coding); thus in
the noisy channel case or in the noiseless channel case with non-integer
capacities and/or i.i.d. sources of non-integer bit rates, [15] cannot
attribute the sufficiency part of (5.4) to [13], [14].

It should be noted here also that [13] and [14], though demonstrating relevant
error exponents (the direct part), do not have the converse part. □

Remark 5.2 (Separation) Here, the term separation is used to mean separation of
distributed source coding and network coding with independent sources.
Theorem 3.1 does not immediately guarantee separation in this sense. However,
when $\rho_{\mathcal{N}}(S)$ is, for example, a polymatroid as mentioned in
Remark 2.3, separation in this sense is ensured, because in this case it is
guaranteed by Lemma 5.1 that there exist some nonnegative rates
$R_i$ ($i \in \Phi$) such that

$$H(X_S|X_{\overline{S}}) \le \sum_{i \in S} R_i \le \rho_{\mathcal{N}}(S) \quad (\emptyset \ne \forall S \subset \Phi). \tag{5.9}$$

Then, the first inequality ensures reliable distributed source coding by virtue
of the theorem of Slepian and Wolf (cf. Cover [5]), while the second inequality
ensures reliable network coding, which looks like coding for non-physical flows,
with independent distributed sources of rates $R_i$ ($i \in \Phi$; see Remark 3.2).
Furthermore, in the particular case of $|\Psi| = 1$, the capacity function
$\rho_{\mathcal{N}}(S)$ is always a polymatroid, so separation holds, where network
coding looks like coding for physical flows (cf. Han [3], Meggido [23], and
Ramamoorthy, Jain, Chou and Effros [15]). It is then natural to ask whether
separability in this sense implies the polymatroidal property. In this
connection, [15] claims that, in the case with $|\Phi| = |\Psi| = 2$ and with
rational capacities as well as sources of integer bit rates, "separation"
always holds, irrespective of the polymatroidal property, while in the case of
$|\Phi| > 2$ or $|\Psi| > 2$ no conclusive claim is made. On the other hand, we
notice here that condition (5.9) is actually sufficient for separability
despite the non-polymatroid property of $\rho_{\mathcal{N}}(S)$. Condition (5.9) is
equivalently written as

$$\mathcal{R}_{\mathrm{SW}} \cap \Bigl( \bigcap_{t \in \Psi} \mathcal{C}_t \Bigr) \ne \emptyset \tag{5.10}$$

for any general network $\mathcal{N}$. Moreover, in view of Remark 3.2, it is not
difficult to check that (5.10) is also necessary. Thus, our conclusion is that,
in general, condition (5.10) is not only sufficient but also necessary for
separability. □
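
The difference between the per-sink condition (5.4) and the common-rate condition (5.9) can be made concrete with the numbers of Example 2. The Python sketch below is a hedged illustration only: it takes the capacity and entropy values exactly as stated in Example 2, fixes the illustrative value $p = 0.25$, and uses the hypothetical dictionary keys `"1"`, `"2"`, `"12"` for $\{s_1\}$, $\{s_2\}$, $\{s_1, s_2\}$. The candidate rate pairs $(1, h(p))$ and $(h(p), 1)$ are chosen by hand for this check.

```python
import math


def h(p: float) -> float:
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def in_region(rates, sigma, rho, tol=1e-12) -> bool:
    """Check sigma(S) <= sum_{i in S} R_i <= rho(S) for S = {1}, {2}, {1,2}."""
    r1, r2 = rates
    sums = {"1": r1, "2": r2, "12": r1 + r2}
    return all(sigma[k] - tol <= sums[k] <= rho[k] + tol for k in sums)


p = 0.25  # illustrative crossover probability (assumption)
hp = h(p)

# Conditional entropies and per-sink capacity functions as stated in Example 2.
sigma = {"1": hp, "2": hp, "12": 1 + hp}
rho_t1 = {"1": 1 + hp, "2": hp, "12": 2.0}
rho_t2 = {"1": hp, "2": 1 + hp, "12": 2.0}
rho_N = {k: min(rho_t1[k], rho_t2[k]) for k in rho_t1}

# (5.4): each sink admits its own rate pair.
per_sink_ok = in_region((1.0, hp), sigma, rho_t1) and in_region(
    (hp, 1.0), sigma, rho_t2
)

# (5.9) would need H(X1 X2) <= R1 + R2 <= rho_N({s1}) + rho_N({s2}).
common_rates_possible = sigma["12"] <= rho_N["1"] + rho_N["2"]

print(per_sink_ok, common_rates_possible)
```

With these values each sink admits a rate pair, so (5.4) holds, whereas a single pair serving both sinks would require $1 + h(p) \le 2h(p)$, which fails for every $h(p) < 1$. Under the stated values, Example 2 therefore appears to be transmissible without being separable in the sense of (5.9).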

Remark 5.3 It is possible also to consider network coding with cost. In this
regard the reader may refer to, e.g., Han [3], Ramamoorthy [27], Lee et al. □

Remark 5.4 So far we have focused on the case where the channels of a
network are quite general but are statistically independent. On the other
hand, we may think of the case where the channels are not necessarily
statistically independent. This problem is quite hard in general. A typical
tractable example of such networks would be a class of acyclic deterministic
relay networks with no interference (called the Aref network) in which the
concept of "channel capacity" is irrelevant. In this connection, Ratnakar and
Kramer [24] have studied Aref networks with a single source and multiple sinks,
while Korada and Vasudevan [25] have studied Aref networks with multiple
correlated sources and multiple sinks. The network capacity formula as well as
the network matching formula obtained by them are in nice correspondence with
the formula obtained by Ahlswede et al. [1] as well as Theorem 3.1 established
in this paper, respectively. □